fix(skills): cursor in CLI help, drop deprecated --force and stale judge vars in tutorial (#133) by Dongbumlee · Pull Request #148 · Azure/agentops

Dongbumlee · 2026-05-14T17:26:03Z

Closes #133.

Summary

Validated docs/tutorial-copilot-skills.md end-to-end. Three drifts fixed across one code file and one doc file.

Drifts fixed

| # | Issue | Fix |
|---|-------|-----|
| 1 | CLI help for `agentops skills install --platform` listed only 'copilot, claude' but cursor is fully supported in `services/skills.py` (auto-detection at line 130, register function `_register_cursor` at line 598, layout entry at line 36). Tutorial lines 66-68 already correctly mention cursor; the help text was the outlier. | Updated `cli/app.py:533` to list all three. |
| 2 | Tutorial Section 2 used `agentops skills install --platform copilot --force`. `--force` is now deprecated ('skills are always overwritten with the latest version' per help text). | Dropped `--force` from the example. |
| 3 | Tutorial recommended setting `AZURE_OPENAI_ENDPOINT` + `AZURE_OPENAI_DEPLOYMENT` alongside `AZURE_AI_MODEL_DEPLOYMENT_NAME`. Stale after PR #141: the deployment-only override now works against the Foundry project endpoint. | Simplified to `AZURE_AI_MODEL_DEPLOYMENT_NAME` only; added a one-liner noting when the `AZURE_OPENAI_*` pair is still required (separate judge resource). |

Verification (clean /tmp)

| Command | Outcome |
|---------|---------|
| `agentops skills install --platform copilot` | ✅ creates `.github/copilot-instructions.md` + 6 SKILL.md files (all 7 listed in tutorial) |
| `agentops skills install --platform claude` | ✅ creates `.claude/commands/agentops-*.md` (6 files) |
| `agentops skills install --platform cursor` | ✅ creates `.github/skills/...` + `.cursor/rules/agentops.mdc` |
| `agentops skills install --platform copilot --force` | ✅ still works (back-compat); `--help` text correctly marks it deprecated |
| `agentops agent analyze --help` | ✅ `--severity-fail critical` exists as documented |

Tests

Full suite: 290 passed, 1 skipped (no test changes — code change is a 1-line help-text update; tutorial fixes are doc-only).

Note for reviewers

Branched off fix/issue-132-baseline-comparison-validation (PR #147). Once that merges, this PR's diff against develop reduces to the 3 fixes here.

Also: issue #131 (conversational agent) was parked with a detailed comment — AgentOps does not implement multi-turn / history / extra_fields cross-row state; the tutorial differentiator is absent. Resuming after the product call.

Azure OpenAI's GPT-5 and o-series reasoning models reject the legacy 'max_tokens' parameter and require 'max_completion_tokens'. The azure-ai-evaluation SDK only switches its built-in evaluators (Coherence, Fluency, Similarity, etc.) to the new parameter when it is constructed with is_reasoning_model=True. This change auto-detects reasoning-model deployments by name and passes is_reasoning_model=True to each evaluator's constructor, so users can judge with gpt-5.x, o1, o3, or o4 deployments without manual config. Detection pattern: deployment names starting with gpt-5, gpt5, o1, o3, or o4 (case-insensitive). Override with the env var AGENTOPS_EVALUATOR_REASONING_MODEL when an alias hides the real model family. Also documents the model-direct judge defaults: when only AZURE_AI_FOUNDRY_PROJECT_ENDPOINT is set, the judge defaults to the target deployment and the endpoint is derived from the Foundry project URL.
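The detection rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the function names `is_reasoning_model` and `evaluator_kwargs` are hypothetical, and the set of values accepted by `AGENTOPS_EVALUATOR_REASONING_MODEL` is an assumption, since the commit message does not spell it out.

```python
import os
import re
from typing import Dict, Mapping, Optional

# Prefixes taken from the commit message: gpt-5, gpt5, o1, o3, o4 (case-insensitive).
_REASONING_PREFIX = re.compile(r"^(gpt-?5|o1|o3|o4)", re.IGNORECASE)


def is_reasoning_model(deployment: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Guess whether a deployment name refers to a reasoning model (hypothetical helper)."""
    env = os.environ if env is None else env
    override = env.get("AGENTOPS_EVALUATOR_REASONING_MODEL")
    if override is not None and override.strip():
        # Assumption: the override is a boolean-like string; the real accepted
        # values are not stated in the source.
        return override.strip().lower() in {"1", "true", "yes", "on"}
    return bool(_REASONING_PREFIX.match(deployment.strip()))


def evaluator_kwargs(deployment: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Extra constructor kwargs for an evaluator (hypothetical helper)."""
    # Only pass the flag when detection says so, mirroring the described behaviour.
    return {"is_reasoning_model": True} if is_reasoning_model(deployment, env) else {}
```

With these assumptions, a deployment named `gpt-4.1-443723` would get no extra keyword, while `o3-mini` or an alias with the override set to `true` would get `is_reasoning_model=True`.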

Previously, switching the AI-assisted evaluator judge to a different deployment required setting both AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_DEPLOYMENT - even when the judge lived in the same Foundry project as the target. Users had to manually compute the classic Azure OpenAI endpoint URL. Now AZURE_OPENAI_DEPLOYMENT (or AZURE_AI_MODEL_DEPLOYMENT_NAME) alone is sufficient when AZURE_AI_FOUNDRY_PROJECT_ENDPOINT is set: AgentOps reuses the Foundry-derived data-plane endpoint. Users with a fully separate Azure OpenAI judge resource still set both vars (unchanged). Endpoint-only overrides remain rejected so AgentOps never silently judges with the wrong deployment. Also documents the deployment-name lookup tip (Foundry suffixes deployment names with random IDs, e.g. gpt-4.1-443723) in the model-direct tutorial. Refs #126
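The override rules above amount to a small decision table. The sketch below is one way to express it, under stated assumptions: `JudgeConfig` and `resolve_judge` are hypothetical names, the precedence of `AZURE_OPENAI_DEPLOYMENT` over `AZURE_AI_MODEL_DEPLOYMENT_NAME` is a guess, and the actual derivation of the data-plane endpoint from the Foundry project URL is deliberately left out because the source does not describe it.

```python
from typing import Mapping, NamedTuple, Optional


class JudgeConfig(NamedTuple):
    deployment: Optional[str]  # None means "fall back to the target deployment"
    endpoint_source: str       # "explicit" or "foundry-derived"


def resolve_judge(env: Mapping[str, str]) -> JudgeConfig:
    """Decide where the judge model comes from (hypothetical helper)."""
    endpoint = env.get("AZURE_OPENAI_ENDPOINT")
    # Assumption: AZURE_OPENAI_DEPLOYMENT wins when both deployment vars are set.
    deployment = env.get("AZURE_OPENAI_DEPLOYMENT") or env.get("AZURE_AI_MODEL_DEPLOYMENT_NAME")
    project = env.get("AZURE_AI_FOUNDRY_PROJECT_ENDPOINT")

    if endpoint and not deployment:
        # Endpoint-only overrides stay rejected so the wrong deployment is never judged silently.
        raise ValueError("AZURE_OPENAI_ENDPOINT set without a judge deployment")
    if endpoint and deployment:
        # Fully separate judge resource: both values are supplied explicitly.
        return JudgeConfig(deployment, "explicit")
    if project:
        # Deployment-only (or no) override: reuse the Foundry-derived endpoint.
        return JudgeConfig(deployment, "foundry-derived")
    raise ValueError("no judge endpoint available")
```

The point of the design is that a deployment name alone is enough inside one Foundry project, while an endpoint without a deployment is treated as a configuration error rather than guessed at.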

…e prereqs

Drift surfaced while validating the minimal quickstart tutorial (#125):

- 'agentops init' now creates three files (added .gitignore at the project root); the doc previously said two.
- The evaluator override YAML example used plain strings, but the AgentOpsConfig schema requires '- name: <ClassName>' entries. Updated the snippet so users can copy/paste it without a ValidationError.
- AI-assisted evaluator prerequisites no longer mandate AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_DEPLOYMENT for Foundry targets: the judge defaults to the target deployment after PR #141. Marked the env vars as 'separate judge resource only'.

Refs #125
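To make the plain-string versus '- name: <ClassName>' distinction concrete, here is a small sketch that checks an already-parsed evaluator list. It is not the AgentOpsConfig schema; `check_evaluator_entries` is a hypothetical helper and `CoherenceEvaluator` is used only as an illustrative class name.

```python
from typing import Any, List


def check_evaluator_entries(entries: List[Any]) -> List[str]:
    """Return human-readable problems for a parsed evaluator list (hypothetical helper)."""
    problems = []
    for index, entry in enumerate(entries):
        if isinstance(entry, str):
            # The old doc snippet used this shape; the schema wants a mapping instead.
            problems.append(f"entry {index}: plain string {entry!r}; use '- name: {entry}'")
        elif not (isinstance(entry, dict) and isinstance(entry.get("name"), str)):
            problems.append(f"entry {index}: expected a mapping with a string 'name' key")
    return problems


# Shape the old snippet produced after YAML parsing (rejected by the schema):
old_style = ["CoherenceEvaluator"]
# Shape produced by '- name: CoherenceEvaluator':
new_style = [{"name": "CoherenceEvaluator"}]
```

Running the check on `old_style` reports one problem, while `new_style` passes cleanly.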

…orted legacy agent claim

Validated docs/tutorial-basic-foundry-agent.md end-to-end against a freshly-created named agent (qa-bot:1) on the new Foundry experience and corrected three drifts:

- Dataset path was '.agentops/data/smoke-agent-tools.jsonl', a file that 'agentops init' has never created. Use the actual seed at '.agentops/data/smoke.jsonl'.
- Judge model was sourced from AZURE_OPENAI_DEPLOYMENT, but Part 2 never set it. After PR #141 the deployment-only override is enough, so Part 2 now exports AZURE_AI_MODEL_DEPLOYMENT_NAME and Part 3 reflects that.
- Tutorial claimed 'AgentOps handles both [named and legacy asst_*] agents. Named agents use the Foundry Responses API; legacy agents use the Threads API.' Neither AgentOps nor the new Foundry Responses API supports asst_* legacy agents today: classify_agent() rejects the bare ID, and the Responses API requires a versioned name even for migrated assistants. Reframed the tutorial as 'named, versioned agents only', linked to the recreate-as-named-agent path, and tracked the gap in #143.

Refs #127
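The 'named, versioned agents only' constraint can be illustrated with a short sketch. This is not the real `classify_agent()`; `parse_agent_ref` is a hypothetical stand-in, and the `name:version` shape is inferred only from the `qa-bot:1` example in the commit message.

```python
from typing import Tuple


def parse_agent_ref(ref: str) -> Tuple[str, str]:
    """Split 'name:version' and reject legacy IDs (hypothetical stand-in for classify_agent)."""
    ref = ref.strip()
    if ref.startswith("asst_"):
        # Legacy assistant IDs are not supported; the agent must be recreated as a named agent.
        raise ValueError(f"legacy agent ID not supported: {ref!r}")
    name, sep, version = ref.partition(":")
    if not sep or not name or not version:
        raise ValueError(f"expected '<name>:<version>', got {ref!r}")
    return name, version
```

Under these assumptions, `parse_agent_ref("qa-bot:1")` yields the name and version, while a bare `asst_*` ID or an unversioned name raises a `ValueError`.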

…ents only

Validated docs/tutorial-rag.md end-to-end against a Foundry named agent (qa-bot:1) on the new Foundry experience: 5 rows x 9 evaluators ran cleanly, all thresholds passed. Three doc fixes informed by runtime validation:

- Part 1 step 3: the knowledge-base/file-search tool is correctly listed as a Foundry feature, but is **optional** for this tutorial. The evaluator only sees the agent's final answer, not its internal retrieval, so a plain prompt agent works equally well for the eval loop. Reworded as optional with a note about when production users would want it.
- Part 4: 'context' bullet now describes the field as 'reference passages' instead of 'retrieved document context' (more accurate, since AgentOps does not capture runtime retrieval). Replaced the vague 'populate with retrieved passages' tip with two concrete workflows (manual reference passages vs pre-script retrieval).
- Notes: added named-agents-only constraint, mirroring the basic-foundry-agent tutorial.

Refs #128. Capture-retrieval gap tracked in #145.

…gn doc

The baseline-comparison tutorial claimed the shipped PR workflow 'already supports' baseline comparison, but the generated agentops-pr.yml ran 'agentops eval run --config <cfg>' with no --baseline flag. Users had to manually edit the workflow.

Make the doc claim true: the PR workflow now auto-detects .agentops/baseline/results.json and passes --baseline when present. Without that file, behaviour is unchanged (no baseline, no comparison).

- src/agentops/templates/workflows/agentops-pr.yml: new shell guard in the 'Run AgentOps eval' step.
- docs/tutorial-baseline-comparison.md: section 4 now reflects the auto-detection (drop the file, no workflow edit needed).
- tests/unit/test_cicd.py: assert the generated workflow contains the baseline-detection block.

Verified end-to-end: ran the baseline comparison flow in /tmp and confirmed results.json carries the documented top-level 'comparison' block (baseline_path, baseline_started_at, baseline_overall_passed, metrics[], rows[]).

Refs #132
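The actual change is a shell guard inside the workflow template; the same auto-detection logic can be sketched in Python for illustration. `build_eval_command` is a hypothetical helper, and passing the baseline file path as the value of `--baseline` is an assumption, since the commit message only says the flag is passed when the file is present.

```python
from pathlib import Path
from typing import List

# Location named in the commit message.
BASELINE_PATH = Path(".agentops/baseline/results.json")


def build_eval_command(config: str, repo_root: Path) -> List[str]:
    """Build the eval command, adding --baseline only if the file exists (hypothetical helper)."""
    cmd = ["agentops", "eval", "run", "--config", config]
    if (repo_root / BASELINE_PATH).is_file():
        # Assumption: --baseline takes the path to the baseline results file.
        cmd += ["--baseline", BASELINE_PATH.as_posix()]
    return cmd
```

Without the baseline file the command is identical to the previous workflow, which matches the 'behaviour is unchanged' guarantee in the commit message.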

…force and stale judge env vars from tutorial

Validated docs/tutorial-copilot-skills.md end-to-end: `agentops skills install` produces all 7 documented files (.github/copilot-instructions.md + 6 SKILL.md) for copilot, equivalent .claude/commands/*.md for claude, and .cursor/rules/agentops.mdc for cursor. Watchdog commands (agent analyze, agent serve) match the tutorial. Three drifts fixed:

- CLI help for 'agentops skills install --platform' listed 'copilot, claude' but cursor is fully supported in services/skills.py (auto-detection, register function, layout). Updated app.py:533 help text.
- Tutorial Section 2 invoked 'agentops skills install --platform copilot --force'. --force is now deprecated ('skills are always overwritten with the latest version' per the help text); dropped it from the example.
- Tutorial Section 'Set local evaluator variables' recommended exporting AZURE_OPENAI_ENDPOINT + AZURE_OPENAI_DEPLOYMENT alongside AZURE_AI_MODEL_DEPLOYMENT_NAME. Stale after #141 (deployment-only override now works against the Foundry project endpoint). Simplified to AZURE_AI_MODEL_DEPLOYMENT_NAME only, with a note about when the AZURE_OPENAI_* pair is still needed (separate judge resource).

Refs #133.

Dongbumlee · 2026-05-14T17:35:33Z

Closing to re-run the validation against the current develop. develop has advanced ~10+ commits (including significant changes to runtime.py, tutorials, and CLI surface) since this PR was opened, and the architecture/text we built on is no longer the baseline.

Will reopen scoped, smaller PRs per backlog issue after re-running each tutorial against the latest develop.

Tracked in plan.md (session state).

DB Lee added 7 commits May 12, 2026 09:25

Dongbumlee closed this May 14, 2026

This was referenced May 14, 2026

CLI help: 'agentops skills install --platform' should list cursor (currently shows only copilot, claude) #157

Open

docs(copilot-skills): drop deprecated --force flag and deduplicate judge env vars (#133) #158

Merged

Dongbumlee wants to merge 7 commits into develop from fix/issue-133-copilot-skills-validation
